Towards structured output prediction of enzyme function

Authors

  • Katja Astikainen
  • Liisa Holm
  • Esa Pitkänen
  • Sandor Szedmak
  • Juho Rousu
Abstract

BACKGROUND In this paper we describe work in progress on developing kernel methods for enzyme function prediction. Our focus is on so-called structured output prediction methods, in which the enzymatic reaction is the combinatorial target object for prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM3) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences.

RESULTS In our experiments on predicting enzyme EC classification, we obtain over 85% accuracy (predicting the full four-digit EC code) and over 91% microlabel F1 score (predicting individual EC digits). In predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting the family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, structured output methods are significantly more accurate than a nearest-neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction.

CONCLUSION Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
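The two figures reported for each task measure different things: accuracy counts a prediction as correct only when the whole four-digit EC code matches, while the microlabel F1 score gives partial credit for each correctly predicted level of the hierarchy. The following is a minimal sketch of that distinction, not the authors' evaluation code. It assumes that each EC code is expanded into its hierarchy nodes (one per prefix) and that F1 is micro-averaged over those nodes; the function names `ec_prefixes`, `exact_accuracy`, and `microlabel_f1` are hypothetical.

```python
from typing import List, Set


def ec_prefixes(ec: str) -> Set[str]:
    """Expand an EC code into its hierarchy nodes (assumed microlabel set).

    '3.4.21.4' -> {'3', '3.4', '3.4.21', '3.4.21.4'}
    """
    digits = ec.split(".")
    return {".".join(digits[:i]) for i in range(1, len(digits) + 1)}


def exact_accuracy(true: List[str], pred: List[str]) -> float:
    """Fraction of proteins whose full EC code is predicted exactly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def microlabel_f1(true: List[str], pred: List[str]) -> float:
    """Micro-averaged F1 over hierarchy nodes, pooled across all proteins."""
    tp = fp = fn = 0
    for t, p in zip(true, pred):
        t_nodes, p_nodes = ec_prefixes(t), ec_prefixes(p)
        tp += len(t_nodes & p_nodes)   # nodes predicted and correct
        fp += len(p_nodes - t_nodes)   # nodes predicted but wrong
        fn += len(t_nodes - p_nodes)   # true nodes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only, not from the paper): the first prediction
# is wrong only in the last digit, the second is fully correct.
true_codes = ["3.4.21.4", "1.1.1.1"]
pred_codes = ["3.4.21.5", "1.1.1.1"]
print(exact_accuracy(true_codes, pred_codes))
print(microlabel_f1(true_codes, pred_codes))
```

On this toy data the near-miss counts as a complete failure for accuracy but still earns credit for its first three levels under the node-level score, which is consistent with the microlabel F1 figures in the abstract being higher than the accuracy figures.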


Related resources

Reaction Kernels - Structured Output Prediction Approaches for Novel Enzyme Function

Abstract: The enzyme function prediction problem is usually solved using annotation transfer methods. These methods are suitable in cases where the function of the new protein has been previously characterized and is included in a taxonomy such as the EC hierarchy. However, given a new function that has not been previously described, these approaches arguably do not offer adequate support for the human expert. In t...


Reaction kernels: predicting enzyme functions you have never seen before

Motivation: Enzyme function prediction is an important problem in post-genomic bioinformatics. There are two general methods for solving the problem: annotation transfer from a similar annotated protein, and machine learning approaches that treat the problem as classification against a fixed taxonomy, such as Gene Ontology or the EC hierarchy. These methods are suitable in cases where the funct...


Output Space Search for Structured Prediction

We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures...


Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first ...


A Structured-Outputs Method for Prediction of Protein Function

We apply the structured-output methodology to the problem of predicting the molecular function of proteins. Our results demonstrate that learning the structure of the output space yields better performance when compared to the traditional “transfer of annotation” method.



Journal:
  • BMC Proceedings

Volume: 2, Issue:

Pages: -

Publication date: 2008